A Model for Learning Words in a Language by Crawling the Web
Abstract
A model for an Internet web crawler that starts with a very limited vocabulary can be devised to learn most words in the English language. The system will be able to read a sentence in which only some of the constituents are known. To achieve this, it will provide a methodology for resolving ambiguities in the unknown constituent words and their parts of speech. The system will include a lexicon of the word types to be learned (nouns, verbs, adjectives, and adverbs), seeded with 100 random words. The source material being read need not be domain-specific. As the lexicon grows and improves, domain-specific and technically advanced jargon can be handled more readily. Some categories of word types, such as determiners (the, a, each, all, etc.), conjunctions (and, or, but), and prepositions (by, along, etc.), can easily be enumerated exhaustively. Because the lexicon of English nouns, verbs, and adjectives is large, an algorithm will be used to determine the nature of unknown words.
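The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea: closed word classes are enumerated, a small seed lexicon covers some open-class words, and the part of speech of an unknown word is guessed from the tags of its neighbors. The names `CLOSED_CLASSES`, `tag_word`, `guess_pos`, and `learn_from_sentence`, as well as the context rules themselves, are hypothetical and chosen for illustration; they are not the authors' method.

```python
# Closed classes are small enough to list by hand.
# These sets are illustrative, not exhaustive.
CLOSED_CLASSES = {
    "det": {"the", "a", "an", "each", "all"},
    "conj": {"and", "or", "but"},
    "prep": {"by", "along"},
}


def tag_word(word, lexicon):
    """Return the known tag for a word, or None if the word is unknown."""
    w = word.lower()
    for pos, members in CLOSED_CLASSES.items():
        if w in members:
            return pos
    return lexicon.get(w)


def guess_pos(prev_tag, next_tag):
    """Guess an open-class tag from neighboring tags.

    These rules are assumptions made for this sketch only.
    """
    if prev_tag == "det" and next_tag == "noun":
        return "adj"
    if prev_tag in ("det", "adj"):
        return "noun"
    if prev_tag == "noun":
        return "verb"
    if prev_tag == "verb" and next_tag is None:
        return "adv"
    return None  # not enough context; leave the word unresolved


def learn_from_sentence(sentence, lexicon):
    """Tag a sentence, guess tags for unknown words, and grow the lexicon."""
    words = sentence.lower().rstrip(".").split()
    tags = [tag_word(w, lexicon) for w in words]
    learned = {}
    for i, current in enumerate(tags):
        if current is not None:
            continue
        prev_tag = tags[i - 1] if i > 0 else None
        next_tag = tags[i + 1] if i + 1 < len(tags) else None
        guess = guess_pos(prev_tag, next_tag)
        if guess is not None:
            learned[words[i]] = guess
            tags[i] = guess  # later words can use this guess as context
    lexicon.update(learned)
    return learned


seed = {"dog": "noun", "runs": "verb", "quickly": "adv"}
print(learn_from_sentence("The crawler runs quickly.", seed))
```

A real system would need many more rules, or a statistical tagger, and some way to revise wrong guesses as more sentences are read; the sketch only shows how a lexicon could grow from partially known sentences.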
Similar resources
The effect of language complexity and group size on knowledge construction: Implications for online learning
This study investigated the effect of language complexity and group size on knowledge construction in two online debates. Knowledge construction was assessed using Gunawardena et al.’s Interaction Analysis Model (1997). Language complexity was determined by dividing the number of unique words by total words. It refers to the lexical variation. The results showed that...
Prioritize the ordering of URL queue in Focused crawler
The enormous growth of the World Wide Web in recent years has made it necessary to perform resource discovery efficiently. For a crawler, it is not a simple task to download domain-specific web pages. This unfocused approach often shows undesired results. Therefore, several new ideas have been proposed; among them, a key technique is focused crawling, which is able to crawl particular topical...
Effective Learning to Rank Persian Web Content
Persian is one of the most widely used languages in the Web environment. Hence, the Persian Web includes invaluable information that needs to be retrieved effectively. As with other languages, ranking algorithms for Persian Web content deal with different challenges, such as applicability issues in real-world situations as well as the lack of user modeling. CF-Rank, as a ...
Impact of Using Web-quests on Learning Vocabulary by Iranian Pre-university Students
Web-quests are internet-based technology applications in which groups of students follow a specific set of steps toward the completion of a final project on a specific subject or a multi-disciplinary subject. The present study aimed to investigate the impacts of using web-quests on learning vocabulary by Iranian pre-university students. The sample of the study consisted of 72 students assigned ...
English Teachers Professional Development Needs for Web Development Skills: Meeting the Challenges of Teaching English Language in the Information Age
Utilizing the resources of the web in educational practices has made instructional processes more efficient and interesting, and has also made the learning process much easier and more attractive. With the web, English language teachers now have the option of engaging learners in online (web-based) instruction in addition to the use of conventional classroom instruction or alternativel...